Outline of descriptive analysis

Introduction

Data set

UNHCR compiles official statistics on stocks and flows of forcibly displaced and stateless persons twice a year: once for mid-year figures (Mid-Year Statistical Reporting, MYSR) and once for end-year figures (Annual Statistical Reporting, ASR). For these reporting exercises, country operations compile aggregate population figures from a range of sources and data producers, such as governments, UNHCR’s own refugee registration database proGres and, in some cases, non-governmental actors. The figures undergo statistical quality control at the country, regional and global levels of the organisation. They are then disseminated on the publicly available Refugee Data Finder (https://www.unhcr.org/refugee-statistics/) after a light statistical disclosure control process that suppresses very small counts of persons that could identify individuals.

The end-year figures, compiled with a reporting date of 31 December, contain sex and age breakdowns of the stocks of displaced and stateless people under UNHCR’s mandate. Table @ref(tab:demref2020) displays the sex- and age-disaggregated data on the stock of refugees under UNHCR’s mandate (including Venezuelans displaced abroad, excluding Palestine refugees under UNRWA’s mandate). The data is available at sub-national level, as indicated by the variables location and urbanRural. The variable statelessStatus indicates whether the reported population is stateless (“STL” and “UDN”) or not stateless (“NSL”). The variables [sex]_[agebracket] contain the counts of refugees as of 31 December 2020 in each sex and age bracket for the respective combination of location and statelessness status. For example, female_12_17 contains the number of female refugees aged 12 to 17. The variable totalEndYear is the total number of refugees across all sex/age categories.

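library(dplyr)  # provides %>% and select()
# Keep the geographic, statelessness, sex/age count and disaggregation-type columns
demref2020 %>% 
  select(asylum_country, origin_country, location, urbanRural, statelessStatus, female_0_4:totalEndYear, typeOfDisaggregation, typeOfDisaggregationBroad)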
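The relationship between the [sex]_[agebracket] counts and totalEndYear can be illustrated with a small row-wise check. This is only a sketch on made-up numbers, not part of the reporting pipeline: the column names male_0_4 and male_5_11 are assumed by analogy with female_0_4, only two age brackets are shown, and in the real data the sums can only match for populations with complete sex/age information.

```r
library(dplyr)

# Made-up counts; male_0_4 and male_5_11 are assumed column names.
toy <- tibble(
  asylum_country = c("AAA", "BBB"),
  female_0_4     = c(10, 5),
  female_5_11    = c(20, 7),
  male_0_4       = c(12, 6),
  male_5_11      = c(18, 8),
  totalEndYear   = c(60, 30)
)

# Sum the sex/age brackets per row and compare with the reported total.
checked <- toy %>%
  mutate(
    sumOfBrackets = rowSums(across(female_0_4:male_5_11)),
    matchesTotal  = sumOfBrackets == totalEndYear
  )

checked %>% select(asylum_country, sumOfBrackets, totalEndYear, matchesTotal)
```

Rows where matchesTotal is FALSE would point to populations whose demographic breakdown is incomplete or absent.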

The pre-defined sex-specific age brackets are 0-4, 5-11, 12-17, 18-24, 25-49, 50-59 and 60 and older. For some population groups, data is only available for the overall 18-59 age group rather than for the finer brackets within this range. For others, only sex-disaggregated data without age information is available. Finally, there are population groups for which only the total end-year count is available, without any demographic information. These levels of data availability are recorded in the variable typeOfDisaggregation in the dataset above: “Sex/Age fine” for the most granular age brackets, “Sex/Age broad” for populations reported with the 18-59 age bracket, “Sex” where only counts of female and male refugees are available without age information, and “None” for populations without any demographic information.
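The four levels described above could be derived along the following lines. This is a hedged sketch on made-up rows, not the code that produced typeOfDisaggregation: the columns female_18_59 and female_total are hypothetical, and it assumes that unavailable breakdowns are stored as NA, which the source does not confirm.

```r
library(dplyr)

# Made-up rows; female_18_59 and female_total are hypothetical column names,
# and NA is assumed to mark a breakdown that was not reported.
toy <- tibble(
  group         = c("A", "B", "C", "D"),
  female_18_24  = c(12, NA, NA, NA),   # fine bracket
  female_18_59  = c(NA, 30, NA, NA),   # broad bracket
  female_total  = c(40, 50, 20, NA),   # sex-only count
  totalEndYear  = c(85, 100, 45, 60)
)

# Assign the most granular level for which data is present.
classified <- toy %>%
  mutate(typeOfDisaggregation = case_when(
    !is.na(female_18_24) ~ "Sex/Age fine",
    !is.na(female_18_59) ~ "Sex/Age broad",
    !is.na(female_total) ~ "Sex",
    TRUE                 ~ "None"
  ))

classified %>% select(group, typeOfDisaggregation)
```

The conditions in case_when are evaluated in order, so each row receives the first (most granular) level that applies.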

Distribution of missing data

t.typeOfDisaggregation

Table @ref(tab:t-typeOfDisaggregation) shows the numbers of refugees and countries of asylum for which sex- and age-disaggregated data is available, together with their proportions of all refugees and all countries of asylum. Age- and sex-disaggregated data is available for 74.9 per cent of the global refugee population, and data disaggregated only by sex for a further 3.5 per cent.
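A coverage table of this kind could be computed roughly as follows. The sketch uses made-up rows rather than demref2020, so the resulting percentages are illustrative only and do not reproduce the figures quoted above; the output column names (refugees, countries, pct_refugees, pct_countries) are hypothetical.

```r
library(dplyr)

# Made-up population groups; country codes and counts are illustrative only.
toy <- tibble(
  asylum_country       = c("AAA", "AAA", "BBB", "BBB", "CCC"),
  typeOfDisaggregation = c("Sex/Age fine", "None", "Sex/Age fine", "Sex", "None"),
  totalEndYear         = c(600, 100, 150, 50, 100)
)

n_countries <- n_distinct(toy$asylum_country)

# Refugees and countries of asylum per disaggregation level, with shares of the totals.
coverage <- toy %>%
  group_by(typeOfDisaggregation) %>%
  summarise(
    refugees  = sum(totalEndYear),
    countries = n_distinct(asylum_country),
    .groups   = "drop"
  ) %>%
  mutate(
    pct_refugees  = 100 * refugees / sum(refugees),
    pct_countries = 100 * countries / n_countries
  )

coverage
```

Because a country of asylum can report different population groups at different levels of disaggregation, the country percentages need not sum to 100, whereas the refugee percentages do.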

By region of asylum

Demographic disaggregation coverage by region of asylum

Globally
- % per age/sex category
- % age missing
- % age and sex missing

By region of origin

Demographic disaggregation coverage by region of origin

By country of origin (CoO)
- % per age/sex category
- % age missing
- % age and sex missing

Demographic disaggregation coverage by origin country and asylum region

2020 end-year refugee/Venezuelan population by origin country and asylum region

By country of asylum (CoA)
- % per age/sex category
- % age missing
- % age and sex missing

Discuss the types of and reasons for missingness (not missing at random, NMAR) and outline a modelling approach to overcome it. Why is relying only on the available data so problematic?